Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REPL: Handle ENOENT errors during socket cleanup #10746

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

DieterReinert
Copy link
Contributor

This PR updates the REPL socket cleanup logic to safely handle ENOENT (no such file or directory) errors when attempting to remove stale socket files. Previously, if a socket file did not exist at cleanup time, the process would crash. Now, ENOENT errors are caught and ignored, ensuring stable cleanup behavior. Any other unexpected filesystem errors will still be logged, allowing for better reliability and easier debugging of file-related issues.

This PR updates the REPL socket cleanup logic to safely handle ENOENT (no such file or directory) errors when attempting to remove stale socket files. Previously, if a socket file did not exist at cleanup time, the process would crash. Now, ENOENT errors are caught and ignored, ensuring stable cleanup behavior. Any other unexpected filesystem errors will still be logged, allowing for better reliability and easier debugging of file-related issues.
@DieterReinert
Copy link
Contributor Author

DieterReinert commented Dec 10, 2024

If we were to remove net.connect and make the function fully synchronous. It would look like the following:

cleanup() {
	const config = typeof Config !== 'undefined' ? Config : {};
	if (!config.repl) return;

	// Clean up old REPL sockets.
	const directory = path.dirname(
		path.resolve(FS.ROOT_PATH, config.replsocketprefix || 'logs/repl', 'app')
	);

	let files;
	try {
		files = fs.readdirSync(directory);
	} catch {
		// If the directory can't be read, do nothing.
		return;
	}

	if (files) {
		for (const file of files) {
			const pathname = path.resolve(directory, file);
			let stat;
			try {
				stat = fs.statSync(pathname);
			} catch {
				// If we can't stat the file, just move on.
				continue;
			}

			if (!stat.isSocket()) continue;

			// Attempt to remove the socket file synchronously.
			try {
				fs.unlinkSync(pathname);
			} catch (err: any) {
				// If the file doesn't exist, there's nothing to remove.
				// Only log the error if it's something other than 'ENOENT'.
				if (err.code !== 'ENOENT') {
					console.error(`Failed to remove stale socket at ${pathname}:`, err);
				}
			}
		}
	}
}

From what I can gather, the original functionality of net.connect() was to distinguish between sockets that need to be cleaned up and those that are still in use, preventing accidental removal of active sockets.

The potential risk of removing the net.connect() call is that you can no longer verify whether a given socket is still in use. As a result, you might end up removing a socket file that an active process is currently relying on, potentially causing disruptions or errors for that process.

A fully async implementation could look like the following:

cleanup() {
	const config = typeof Config !== 'undefined' ? Config : {};
	if (!config.repl) return;

	const directory = path.dirname(
		path.resolve(FS.ROOT_PATH, config.replsocketprefix || 'logs/repl', 'app')
	);

	let files: string[];
	return fs.promises.readdir(directory).then(f => {
		files = f;
		const promises = files.map(file => {
			const pathname = path.resolve(directory, file);
			return fs.promises.stat(pathname).then(stat => {
				if (!stat.isSocket()) return;
				const socket = net.connect(pathname);
				return new Promise<void>((resolve) => {
					socket.on('connect', () => {
						// Socket is active; just close and move on.
						socket.end();
						socket.destroy();
						resolve();
					});
					socket.on('error', async () => {
						// Socket is stale; remove it.
						try {
							await fs.promises.unlink(pathname);
						} catch (err: any) {
							if (err.code !== 'ENOENT') {
								console.error(`Failed to remove stale socket at ${pathname}:`, err);
							}
						}
						resolve();
					});
				});
			}).catch(() => {
				// If stat fails or something else goes wrong, skip this file.
			});
		});

		return Promise.all(promises);
	}).catch(() => {
		// If the directory can't be read or something else fails, do nothing.
	});
}

The asynchronous version is more selective, only removing confirmed stale sockets, while the synchronous version removes all socket files it encounters.

Could you take a look at this @Zarel ?

@DieterReinert
Copy link
Contributor Author

I haven't had a chance to look deeper. The above is a small brain dump. I will try looking into it tonight or weekend.

@Slayer95
Copy link
Contributor

That sounds about right. That should matter when a previous PS process somehow became unresponsive, and there's urgency in getting another PS instance up and running, but still leaving leeway for debugging the old process.

If we were to take the async route (or even if we don't, I guess), I'd suggest using fs.Dirent

@DieterReinert
Copy link
Contributor Author

I updated the snippets using your suggestion to use fs.Dirent:
Async:

cleanup() {
    const config = typeof Config !== 'undefined' ? Config : {};
    if (!config.repl) return;

    const directory = path.dirname(
        path.resolve(FS.ROOT_PATH, config.replsocketprefix || 'logs/repl', 'app')
    );

    try {
        // Read directory entries with their types
        const dirents = await fs.promises.readdir(directory, { withFileTypes: true });

        // Filter out entries that are sockets
        const socketDirents = dirents.filter(dirent => dirent.isSocket());

        // Map each socket entry to a promise that handles its cleanup
        const cleanupPromises = socketDirents.map(dirent => {
            const pathname = path.resolve(directory, dirent.name);
            return handleSocket(pathname);
        });

        // Wait for all cleanup operations to complete
        await Promise.all(cleanupPromises);
    } catch (err) {
        // Handle errors related to reading the directory
        // For example, log them or silently ignore as per original logic
        // Here, we're choosing to silently ignore to match original behavior
    }
}

handleSocket(pathname: string): Promise<void> {
    try {
        await fs.promises.stat(pathname); // Optional: Additional check if needed

        return new Promise<void>((resolve) => {
            const socket = net.connect(pathname);

            socket.on('connect', () => {
                // Socket is active; just close and resolve
                socket.end();
                socket.destroy();
                resolve();
            });

            socket.on('error', async () => {
                // Socket is stale; attempt to remove it
                try {
                    await fs.promises.unlink(pathname);
                } catch (err: any) {
                    if (err.code !== 'ENOENT') {
                        console.error(`Failed to remove stale socket at ${pathname}:`, err);
                    }
                }
                resolve();
            });
        });
    } catch (err) {
        // If stat fails or something else goes wrong, skip this socket
    }
}

Sync:

cleanup() {
    const config = typeof Config !== 'undefined' ? Config : {};
    if (!config.repl) return;

    // Determine the directory containing REPL sockets.
    const directory = path.resolve(
        FS.ROOT_PATH,
        config.replsocketprefix || 'logs/repl',
        'app'
    );

    let dirents;
    try {
        // Read the directory entries with file types.
        dirents = fs.readdirSync(directory, { withFileTypes: true });
    } catch (err) {
        // If the directory can't be read (e.g., it doesn't exist), exit gracefully.
        if (err.code !== 'ENOENT') {
            console.error(`Error reading directory ${directory}:`, err);
        }
        return;
    }

    // Iterate over each directory entry.
    for (const dirent of dirents) {
        // Check if the entry is a socket.
        if (dirent.isSocket()) {
            const pathname = path.join(directory, dirent.name);

            try {
                // Attempt to remove the socket file synchronously.
                fs.unlinkSync(pathname);
                console.log(`Removed stale socket: ${pathname}`);
            } catch (err) {
                // If the file doesn't exist, there's nothing to remove.
                // Only log the error if it's something other than 'ENOENT'.
                if (err.code !== 'ENOENT') {
                    console.error(`Failed to remove stale socket at ${pathname}:`, err);
                }
            }
        }
    }
}

Anyone have any thoughts? What we should do best?

@DieterReinert DieterReinert changed the title Handle ENOENT Errors During REPL Socket Cleanup REPL: Handle ENOENT errors during socket cleanup Dec 31, 2024
@DieterReinert
Copy link
Contributor Author

Another error just occurred that should be fixed by this:

The main process crashed: Error: ENOENT: no such file or directory, unlink '/home/ps/main/logs/repl/chat-db'
at Object.unlinkSync (node:fs:1881:3)
at Socket.<anonymous> (/home/ps/main/lib/repl.ts:89:9)
at Socket.emit (node:events:517:28)
at Socket.emit (node:domain:489:12)
at emitErrorNT (node:internal/streams/destroy:151:8)
at emitErrorCloseNT (node:internal/streams/destroy:116:3)
at processTicksAndRejections (node:internal/process/task_queues:82:21

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants